Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems' Outputs for Spoken Term Detection
Abstract
This paper describes a novel method for estimating the correct phoneme sequence with a recurrent neural network (RNN)-based framework for spoken term detection (STD). In an automatic speech recognition (ASR)-based STD framework, ASR performance (word or subword error rate) directly affects STD performance, so reducing ASR errors is important for obtaining good STD results. In this study, we use an RNN-based phoneme estimator as a post-processing step to reduce phoneme errors: it estimates the correct phoneme sequence of an utterance from several kinds of phoneme-based transcriptions produced by multiple ASR systems. On two test speech corpora, the proposed phoneme estimator produced phoneme-based N-best transcriptions with fewer phoneme recognition errors than the N-best transcriptions from the best ASR system we prepared. In addition, on two STD test collections, the STD system with the RNN-based phoneme estimator drastically improved STD performance compared with our previously proposed STD system, which uses a conditional random field (CRF)-based phoneme estimator.
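The abstract compares systems by their phoneme recognition errors. As a rough illustration, the sketch below computes a phoneme error rate in the conventional way: Levenshtein (edit) distance between a reference and a hypothesis phoneme sequence, divided by the reference length. This is an assumption about the metric, since the abstract does not name the scoring tool or exact definition the authors used. The function names and the `oracle_nbest_error_rate` helper (lowest error rate over an N-best list) are hypothetical and chosen only for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    # prev[j] holds the distance between the first i-1 reference symbols
    # and the first j hypothesis symbols.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference symbol
                            curr[j - 1] + 1,      # insertion of a hypothesis symbol
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def oracle_nbest_error_rate(ref, nbest):
    """Lowest phoneme error rate among the hypotheses in an N-best list."""
    return min(phoneme_error_rate(ref, hyp) for hyp in nbest)


if __name__ == "__main__":
    # Toy phoneme strings, invented for illustration only.
    reference = "k a i g i".split()
    nbest = ["k a g i".split(), "k a i g i".split()]
    print(phoneme_error_rate(reference, nbest[0]))
    print(oracle_nbest_error_rate(reference, nbest))
```

Under this definition, an estimator that merges the outputs of several ASR systems would be judged better than the single best system when its N-best hypotheses yield a lower rate on the same references.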
Similar resources
Evaluation of DNN-based Phoneme Estimation Approach on the NTCIR-12 SpokenQuery&Doc-2 SQ-STD Subtask
This paper proposes a correct phoneme sequence estimation method using a deep neural network (DNN)-based framework for spoken term detection (STD). We use a DNN architecture as a correct phoneme estimator. In post-processing, the DNN-based estimator estimates the correct phoneme sequence of an utterance from several kinds of phoneme-based transcriptions produced by multiple ASR systems, for reducing ...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers because of its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
Improving the Performance of a Continuous Speech Recognition System Using Features Extracted from Speech Manifolds in the Reconstructed Phase Space
Designing new methods for extracting features from the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
Detecting repetitions in spoken dialogue systems using phonetic distances
This paper addresses the problem of automatic detection of repeated turns in Spoken Dialogue Systems. Repetitions can be a symptom of problematic communication between users and systems. Such repetitions are often due to speech recognition errors, which in turn makes it hard to use speech recognition to detect repetitions. We present an approach to detect repetition using the phonetic distance ...
Topic Identification for Speech Without ASR
Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs. However, under resource-limited conditions, the manually transcribed speech required to develop standard ASR systems can be severely limited or unavailable. In this paper, we investigate alternative unsupervise...